Omniscopes: Large Area Telescope Arrays with only N log N Computational Cost 
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We show that the class of antenna layouts for telescope arrays allowing cheap analysis hardware 
(with correlator cost scaling as A'' log N rather than A^ with the number of antennas A) is encour- 
agingly large, including not only previously discussed rectangular grids but also arbitrary hierarchies 
of such grids, with arbitrary rotations and shears at each level. We show that all correlations for 
such a 2D array with an n-level hierarchy can be efficiently computed via a Fast Fourier Transform 
in not 2 but 2n dimensions. This can allow major correlator cost reductions for science applications 
requiring exquisite sensitivity at widely separated angular scales, for example 21cm tomography 
(where short baselines are needed to probe the cosmological signal and long baselines are needed for 
point source removal), helping enable future 21cm experiments with thousands or millions of cheap 
dipole-like antennas. Such hierarchical grids combine the angular resolution advantage of traditional 
array layouts with the cost advantage of a rectangular Fast Fourier Transform Telescope. We also 
describe an algorithm for how a subclass of hierarchical arrays can efficiently use rotation synthesis 
to produce global sky maps with minimal noise and a well-characterized synthesized beam. 



I. INTRODUCTION 

There is now strong community interest in building 
more sensitive radio telescopes, stemming from diverse 
science opportunities that range from planets to pulsars, 
from black holes to cosmology [1]. However, greater sen- 
sitivity requires greater collecting area, which in turn in- 
creases cost. The cost for a steerable single-dish tele- 
scope grows faster than Hnearly with area, and becomes 
prohibitive beyond a certain point, which has bolstered 
interest in interferometers. Interferometer arrays can be 
made arbitrarily large, but for a generic array layout, the 
cost unfortunately grows quadratically with area asymp- 
totically, eventually becoming financially unviable. The 
reason for this is that for an array of N antennas, all 
N{N —l)/2 05 N"^ pairs of antennas need to be correlated 
to calculate the so-called visibilities which encode the sky 
image. Thus the cost of the hardware performing these 
computations scales as N"^, dominating all other costs of 
the interferometer (which tend to grow linearly) for suffi- 
ciently large N. An array of A^ 10^ cheap dipole-style 
antennas has the potential to greatly improve constraints 
on dark matter, dark energy, inflation, reionization, etc. 
[2-7] , and the A^^-component of the cost starts dominat- 
ing already around A^ ~ lO'^ [8]. 

To overcome this limitation, two cost-cutting ap- 
proaches have emerged: 

1. Partitioning the array into "tiles" of M antennas 
apiece, each treated as a single element as in [8-11]. 
This cuts the correlation cost from A^^ to {N/MY, 
at the price of reducing the sky area covered by a 
factor M. In other words, the time savings entirely 
come from throwing away available sky information 
and thus not needing to compute it. 

2. Arranging the antennas into a rectangular grid that 



can be correlated using Fast Fourier Transforms 
(FFTs) [12-18]. This reduces the computational 
cost from A^^ to A^ log2 N, but by lacking long base- 
lines between antennas, a fully sampled square grid 
provides much lower resolution than a conventional 
sparse array. 

The goal of the present paper is to explore what class 
of antenna layouts permit cheap (A^ log2 A^) signal pro- 
cessing without throwing away any information. We will 
show that this class is encouragingly large, including not 
only the above-mentioned rectangular FFT grids, but 
also arbitrary hierarchies of such grids, such as the ex- 
ample shown in Figure 1. This opens up the possibility 
of getting the best of both worlds, combining affordable 
signal processing with baseline coverage tailored to spe- 
cific scientific needs. We will refer to such an array that 
is effectively omnidirectional and omnichromatic as an 
omniscope^ . 

The computational savings of this type of arrays comes 
at a price. By construction, the array instantaneously 
measures a significantly smaller number of visibilities (of 
order A^ rather than A^^). As we will discuss later, this 
drawback can be compensated at least partially by sky 



^ If the electric field is digitized at set of antenna elements with 
broad primary beams (like those of the MWA [8] , PAPER [10] 
or 21CMA [11], say) and these antennas are correlated individu- 
ally rather than in tiles, then the instantaneous field-of-view will 
be much of the solid angle above the horizon, and the spectral 
coverage in principle covers a large fraction of the range from 
zero up to the Nyquist frequency, limited only by antenna and 
feed response, analog filtering requirements, etc. The designation 
"telescope" feels like a misnomer for such an instrument, since it 
is not zooming in on a distant object subtending a small fraction 
of the sky. 
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frequency, say) in a rectangular array, and convolves this 
array with a parity-reversed copy of itself, obtaining an 
image of what radio astronomers refer to as "visibilities" 
in the "wi; plane". Mathematically, this is equivalent 
to performing an FFT, computing its square modulus, 
and performing an inverse FFT. Each visibility is sim- 
ply the correlation of signals from antennas separated by 
the corresponding distance vector (known as a baseline) 
summed over all pairs with the same separation. Such 
uv plane observations made at different times can then 
be combined to average down instrumental noise. For 
any given pointing, only a fraction of pixels in the uv 
plane will be measured, but Earth's rotation can be ex- 
ploited to fill in missing baselines. This final uv plane 
image is essentially the Fourier transform of the sky — 
[18] includes the complications of polarization and sky 
curvature. 



FIG. 1: Example of a 3-levcl hierarchy of antennas, with 5x3 
blocks arranged in 3 x 3 blocks that are in turn placed in a 3 x 3 
block. Note that the blocks at each level can be non-square 
(like at level 1), sheared (like at level 2) and rotated (like at 
level 3), in any combination. We show that this particular 
array can be efficiently correlated with a 6-dimensional FFT. 



rotation. Furthermore, this reduced number of instanta- 
neous visibilities implies an enormous redundancy: each 
visibility is independently measured by order N differ- 
ent antenna pairs, providing an opportunity for vastly 
improved calibration and systematics control. 

The rest of this paper is organized as follows. In Sec- 
tion II, we describe our hypercube algorithm for ana- 
lyzing arbitrary multi-level arrays in TV log2 N time. In 
Section III, we provide examples of how the sky can be 
probed with arrays in this class. We summarize our con- 
clusions in Section IV. In Appendix A, we describe how 
multiple measurements from such arrays can be rapidly 
merged into sky maps of the full curved sky. 



II. THE HYPERCUBE ALGORITHM 

A. Background 

For a modern introduction to radio astronomy, see 
e.g. [19]. A brief and self-contained description of how 

a rectangular array of N antennas can map essentially 
the whole sky above the horizon at N log2 N computa- 
tional cost can be found in [18], without using any of the 
approximations that are commonly used in radio astron- 
omy. One arranges data from each antenna (at a given 



B. Hierarchical grids 

A more complicated sparse array layout such as the 
one in Figure 1 can of course be analyzed with this same 
formalism if the FFTs used for the correlation are per- 
formed on a square grid where each antenna falls in a 
single entry or pixel, but where many of the entries are 
empty. This would be as time consuming as analyzing a 
non-sparse (fully filled) grid, thus taking no advantage of 
the sparseness. In the example in Figure 1, this proce- 
dure would increase of the computational cost by a factor 
of ~ 8, and this inefficiency factor would be much larger 
for arrays involving some very long baselines. 

Fortunately, there is a much better way: simply ar- 
range the data from Figure 1 in a 6-dimensional hyper- 
cube of dimensions 5x3x3x3x3x3, and perform the 
correlation in 6 dimensions using 6-dimensional FFTs. 
We prove below that this gives the exact same result, 
and this computation is clearly much faster: it scales as 
N log2 N where N is not the total number of pixels in 
Figure 1, but only the number of pixels containing an 
antenna. 



C. A warmup example 

Before getting rigorous, let us consider the following 
simple toy example to clarify the situation. Suppose we 
have a simple one-dimensional array consisting of A'' = 6 
antennas arranged in 2 groups of 3, located along the x- 
axis at positions Xi = 1,2,3,7,8,9 in some units. The 
corresponding possible separations are all integers —8 < 
Ax < 8 except Ax = ±3. Suppose moreover that in some 
appropriate units, they happen to measure voltage val- 
ues fi = 1,2,3,4,5,6. Arranging the voltages measured 
along the x-axis in a vector f = (1, 2, 3, 0, 0, 0, 4, 5, 6), we 
now wish to compute the convolution of f with its par- 
ity reversed version f ~ = (6, 5, 4, 0, 0, 0, 3, 2, 1) to obtain 
the visibility vector g = f ★ f ~ , where ★ denotes convolu- 
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tion. This visibilities vector will contain the product of 
the voltages of antennas separated by a given distance, 
summed over all pairs with the same separation. For an 
n-dimensional vector f , this convolution is defined by 

min{n,z+n} 

5,^(f*f-),^ J2 fjfj-i, (1) 

j=max{l,z+l} 

where i = —{n — 1), ...,n—l {i corresponds to differences 
between the integers j = 1, 

There are now three equivalent ways in which we can 
compute this. The slowest of all would be to simply com- 
pute all the products and sum up the ones corre- 
sponding to the same separation. The other two rely 
on doing FFTs, in either a slower or a faster way. The 
slower would just perform a zero-padded FFT, squaring 
the result and inverse transforming. The result is 

g = (1 2300045 6) ★(6 5400032 1) 

= (6 17 32 23 12 27 58 91 58 27 12 23 32 17 6) . 

However, by instead arranging the measured data in a 
2D array, we can obtain the same answer by performing 
a 2D convolution exploiting 2D FFTs: 

/I o Q\ /RK A\ / 6 17 32 23 12\ 

The latter is faster, since it eliminates all "multiply by 
zero" steps stemming from the sparsencss of the ID array. 
The sparser the array, the greater the speedup. Note that 
there are two copies of all but the middle number in the fi- 
nal convolution, which is effectively a palindrome. This is 
of course because the inputs are real- valued; by perform- 
ing an real FFT, one avoids computing these numbers 
twice. 



D. The general case 

Let us now investigate why this trick works and gen- 
eralize it. We define a hierarchical grid as a set of points 
of the form 

r = iiSLi + 12^2 + (3) 

where ii, Z2, ... are integers spanning some finite ranges 
and ai, a2, ... are vectors. We will be mainly interested in 
the case where these are 2-dimensional vectors, but our 
algorithm works for any dimensionality. We will insist 
that the integer ranges be such that each point r in the 
grid corresponds to a unique set of integers. Our toy ex- 
ample above corresponds to the special case ai = (1,0), 
SL2 = (6,0), ii = 1,2,3 and 13 = 0,1. The 3-lcvcl hi- 
erarchical grid in Figure 1 corresponds to ai = (1,0), 
a2 = (8,0), as = (40,10), a4 = (0,1), as = (2,8), 
ag = (—10, 40), ni = 1, 5 and 712 through n% all rang- 
ing from 1 to 3. 



Following the notation from Section II C above, we let 
/r denote the voltage measured by the antenna at posi- 
tion r (at some time, in some frequency band) and wish 
to compute the visibility map g = f ★ f ~ . Using equa- 
tion (3) and the definition of convolution, we obtain 

9t = fi'j-|-nai-|-i2a2-|-... — ^ ^ /r'/r'-r = 

r' 

~ /j'-|-iiai-|-i^a2-|-.../(j'-Fiiai-|-i^a2-|-...)-(j-|-iiai-|-i2a2-F...) 
i'i'A — 

~ fi{aL-i_+i'^a.2+..J-i+(i'i-ii)a.i+{i'2-i2)aL2+... (4) 

■/ ■/ 

On the first line, we have included a residual vector 
j = r — (iiai -|- i2a2 -|- ...) to allow for the fact that r 

may not lie on the grid. We have introduced an anal- 
ogous vector j' on the second line to ensure that the 
sum runs over all r'-values, not merely those in the grid. 
Since our measurements / vanish off the grid, we have 
/j'+i'^ai+i^a2+... = whenever j' 7^ 0, so that we can set 
j' = and sum only over «']^,«2,.... Rearranging terms 
on the third line, we see that the second factor now 
vanishes whenever wherever j ^ 0, which means that 
gr equals zero outside the grid and need only be com- 
puted on the grid. Defining Fi^i^,,, = /j^ai-i-i2a2-i-... and 
Gi^i^,,, = g'iiai4-i2a24-...i ^e Can therefore rewrite equa- 
tion (4) as 

Giii2- = X] ^i'ii'2^(i'i-ii),(i2-i2),-' i^) 

which we recognize as simply a multidimensional convo- 
lution. Note that if an index of F runs from 1 to n, the 
corresponding index of G runs from — (n — 1) to (n — 1). 

This result is very general: the proof assiimes only that 
the mapping from r to a set of integers is unique, so grid 
points are even allowed to be interleaved between differ- 
ent levels of the hierarchy as long as they never collide — 
consider for example ai = (5,0), a2 = (7,0), ii = 1,..4 
and (2 = 1,...,4. Moreover, the proof is readily gener- 
alized to the case where the components of r are not 
integers or even rational numbers. 

III. EXAMPLES 

Because of the hypercube data analysis algorithm de- 
scribed above, the class of antenna layouts for affordable 
large-area radio telescope arrays is quite large. In this 
section, we illustrate the possibilities with some simple 
examples. 

Figures 2-5 show four hierarchical grid omniscope lay- 
outs that we nickname "3-level" (Figin-e 2), "Blocks" 
(Figure 3), "Plank" (Figure 4) and "Strips" (Figure 5). 
In all cases, separations are in units of the minimum 
spacing between antennas, which is our examples is 1 
meter. For comparison, we have also analyzed four non- 
hierarchical layouts for which the computational cost 
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Baseline [meters] 
FIG. 2: The antenna layout (top left) from Figure 1 convolved 
with its parity reversal gives the two-dimensional baseline dis- 
tribution (top right), and binning this radially gives the one- 
dimensional baseline distribution (bottom), which is plotted 
on a linear scale. The layout corresponds to ai = (1, 0), a2 = 
(8,0), as = (40,10), a4 = (0,1), as = (2,8), as = (-10,40), 
ni = 1, 5 and n2 through ne all ranging from 1 to 3. 



Baseline [riiclcrs] 
FIG. 3: Antenna layout (top left), 2D baseline distribution 
(top right) and ID baseline distribution (bottom) for the hi- 
erarchical "Blocks" layout consisting of four widely separated 
16 X 16 antenna blocks (ai = (1, 0), a2 = (200, 0), as = (0, 1), 
a4 = (0, 200), ni = ng = 16, na = n4 = 2). 




scales as N'^: these are 500-tile simulations described 
in [20] for the MWA-experiment, using random antenna 
placements with a radial density scaling as r*' (uniform), 
r~^, and r~^, respectively. The example is illus- 
trated in 2D in Figure 6. When plotting the 2D baseline 
distribution d, we convolved the layout map / with its 
parity reversal, d = I -k , where the layout map just 
contains one in every entry. We have zeroed the origin 
and chosen a grey-scale where the intensity is propor- 
tional to sinh~^(4(i/c?max) to avoid saturation and allow 
decent legibility at both small and large values of d. Note 
that the different examples have a different total number 
of antennas. 

In Appendix A, we will discuss the issue of how these 
baseline distributions can be filled in by exploiting Earth 
rotation. For our present discussion, we will focus on 
radially binned versions of these baseline distributions, 
which show how much information is measured on differ- 
ent angular scales. 

The first point illustrated by these examples is that a 
wide variety baseline distributions can be achieved with 
a hierarchical antenna grid — which is hardly surpris- 
ing given how large this class of grids is. Figure 7 com- 
pares the four hierarchical grid examples (bottom panel) 
with the four simulated MWA examples (top panel), all 
rescaled to have a 1 km maximum baseline. Because our 
different examples have different number of antennas, the 
curves have been arbitrarily rescaled in the vertical di- 
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Baseline [riielci's] 
FIG. 4: Antenna layout (top left), 2D baseline distribution 
(top right) and ID baseline distribution (bottom) for the 
"Plank" layout consisting of a single 512 x 32 antenna block, 
(ai = (1,0), aa = (0, 1), m = 512, na = 32). 

rection for legibility. Thus only the shapes of the plotted 
curves are important, not the amplitudes. We see that if 
one has a preference for one of these MWA-distributions, 
then faster-to-analyze hierarchical grids can reproduce at 
least some of their broad-brush features: note the similar- 
ities between "r~^ and "Plank" , and between "Uniform" 
and "Strips". 

In addition, we see that hierarchical grids can easily of- 
fer quite different types of baseline distributions as well. 
For example, "Blocks" is seen to provide sensitivity on 
widely separated angular scales, which may be be desir- 
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Baseline [rrieLers] 
FIG. 5: Antenna layout (top left), 2D baseline distribution 
(top right) and ID baseline distribution (bottom) for the 
"Strips" layout consisting of 32 separated rows of antennas 
(ai = (16, 0), aa = (0, 1), m = 32, n2 = 512). 

able for some science applications (like 21cm tomography, 
where short baselines are needed to probe the cosmolog- 
ical signal and long baselines are needed for point source 
removal) . 

If one drops the requirement of strictly optimal 
(TV log A^) computational cost and is willing to stomach 
an extra factor of two, then building a second completely 
separate omniscope can offer interesting advantages. For 
example, if two omniscopes with the "Plank" layout of 
Figure 4 are built rotated by 90° relative to one another, 
then full rotation synthesis can be obtained in merely 6 
hours instead of 12, conveniently carried out during a 
single night. Note that in this way, one computes only 
correlations within each "Plank" and as a result the one 
gathers just half the available information. Even if day- 
time observations are acceptable, this greatly improves 
the uv coverage in generic sky directions from generic lat- 
itudes. In the limit where the width of the Plank shrinks 
to a single antenna, this is similar in spirit to the cylinder 
telescopes designed by [16, 17]. (Note that if two cylin- 
ders are not parallel (as in the Mills Cross telescope [21]), 
then correlation obviously requires the full N'^ cost, since 
there are ~ N'^ separate baselines to compute.) 

In the same spirit, another interesting variant [22, 23] 
is to use FFT-correlation separately in a set of (M/N) 
different rectangular sub-arrays ("tiles") of AI anten- 
nas each that arc translated relative to each other 
by arbitrary amounts, and correlate the corresponding 
synthesized beams between tiles using the brute-force 
{N /MY approach, giving a total computational cost of 
{M\ogM){N/Mf ~ {N\ogN){N/M). This is an in- 
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Baseline [meters] 
FIG. 6: Antenna layout (top left), 2D baseline distribution 
(top right) and ID baseline distribution (bottom) for a the 
simulated MWA design from [20] with r~'^ antenna density. 

termediate case cheaper than the straight iV^ and more 
expensive than an N log N omniscope because it exploits 
the FFT speedup for two hypercube dimensions, within 
but not between tiles. 

Our examples above assumed no errors in the antenna 
positions relative to the hierarchical grid. The question of 
how antenna position errors affect calibration and image 
reconstruction deserves further work. A first exploration 
of this problem can be found in [24] , showing how calibra- 
tion errors scale linearly with such position errors, and 
how this scaling can be improved to quadratic or better 
with more elaborate software. 



IV. CONCLUSIONS 

We have shown that the class of antenna layouts for 
radio telescope arrays allowing cheap analysis hardware 
(with correlator cost scaling as NlogN rather than N"^ 
with the number of antennas N) is encouragingly large, 
including not only previously discussed rectangular grids 
but also arbitrary hierarchies of such grids, with arbitrary 
rotations and shears at each level. 

This hierarchical grid layout can allow major correlator 
cost reductions for science applications requiring sensitiv- 
ity at widely separated angular scales, for example 21 cm 
tomography (where short baselines are needed to probe 
the cosmological signal and long baselines are needed for 
point source removal). For example, the computers for 
calculating the ^ N"^ correlations constitute (by design) 
about half the hardware cost of the 512-element MWA 
telescope [8] , so for the significantly larger iV- values that 
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FIG. 7: The baseline distributions from four simulated MWA 
layouts with A'^^ computational cost (top) are compared 
with baseline distributions from four hierarchical grids with 
NlogN computational cost (bottom). For improved legibil- 
ity, all curves in this figure have been boxchar smoothed with 
a sliding bin of width 5 meters to reduce noise. 



with detailed simulations that such calibration based on 
redundant baselines calibration with an omniscope can 
be made both accurate and computationally feasible.^ 

In summary, there is now strong community interest 
in building more sensitive radio arrays consisting of large 
numbers of cheap dipole-like antennas, and our results 
indicate a potential for doing so with more bang for the 
buck. 
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will be needed to perform precision 21 cm cosmology 
[5-7] , computing hardware will completely dominate the 
budget of any non-hierarchical array. Looking towards a 
future with more sensitive instruments, hierarchical grids 
can therefore be used both to cut the costs for a fixed col- 
lecting area and to boost the area attainable with a given 
budget. 

If a science goal requires a good approximation to a 
particular baseline distribution (in ID or 2D), one can 
use simulated annealing or another suitable tool to solve 
the nonlinear optimization problem of where to place the 
antennas, either within the hierarchical grid class of lay- 
outs or with complete freedom. In most cases, general 
non-hierarchical grids can obviously provide the most ac- 
curate approximation, but at a much higher price. It is 
therefore worthwhile to start paying closer attention to 
what baseline distributions one actually needs for various 
science applications, and to quantify whether the case for 
non-hierarchical layouts is compelling enough to justify 
the extra cost. By definition, a hierarchical grid will fill 
a smaller fraction of the uv plane per snapshot (concen- 
trating all its sensitivity measuring or order N rather A^ 
baselines), but as described in Appendix A, this can in 
many cases be made up for by rotation synthesis. More- 
over, the massive uv redundancy, whereby most baselines 
are independently measured by many antenna pairs, of- 
fers a powerful tool for internal calibration and system- 
atic error control of an omniscope. In [24], it is shown 



^ Our examples in Section III assumed no errors in the antenna 
positions relative to the hierarchical grid. The question of how 
antenna position errors affect calibration and image reconstruc- 
tion deserves further work. A first exploration of this problem 
can be found in [24] , showing how calibration errors scale linearly 
with such position errors, and how this scaling can be improved 
to quadratic or better with more elaborate software. There are 
also a number of issues for which the FFT-based correlation does 
no better and no worse than traditional N'^ correlation; once the 
baseline distribution is fixed, our hypercube algorithm is after all 
just a mathematical trick for computing the exact same numbers 
as with the A''^ method, just faster. Such issues include sidelobes 
due to spatial undersampling, ionospheric modeling [25-27] , and 
direction-dependent gain calibration. The first can be mitigated 
by planing antennas less than a wavelength apart at the lowest 
level of the hierarchical grid. The above-mentioned redundancy 
can help also with direction-dependent gain calibration if there 
are point sources with known positions that completely domi- 
nate the signal in their frequency band (like the ORBCOMM 
satellites), and for the wavelengths most relevant to 21cm to- 
mography, the ionosphere only becomes a problem for baselines 
above the kilometer scale. 
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Appendix A: Rotation synthesis and mapmciking 
with hierarchical omniscopes 

Above wc described how a hierarchical omniscope 
could be correlated at an A^logA^ computational cost 
per snapshot. In this Appendix, we will discuss how, 
in the spirit of the standard radio-astronomy technique 
known as faceting [19], such snapshots can be combined 
into a seamless sky map with minimal noise and a well- 
characterized synthesized beam, again at a modest com- 
putational cost. 

1. Optimal mapmaking 

The theory of how to make maps with minimal noise 
has been extensively developed and applied in the context 
of CMB observations (see for example [28])^. Consider a 
set of n observations which we arrange in a vector y (in 
our case these are the measured correlations, N{N — l)/2 
visibilities per snapshot in time). We will assume that y 
is of the form 

y = Ax + n, (Al) 

where x is an m-dimensional vector with the intensity 
(and polarization) of the sky in different directions, A is 
a known matrix encoding the response of the instrument 
and n is the noise. Our goal is to recover x in a way 
that does not lose any of the cosmological information. 
There are many possible linear lossless estimators of x, 
because if one multiplies any lossless estimator by an in- 
vertible matrix, the resulting vector also contains all the 
available information. As discussed in [28], these lossless 
estimators are of the form 

X = MA*N- V, (A2) 

where M is an m, x m matrix. A possible choice of M 
is M = [A'N^^A]^^ which makes the estimator x unbi- 
ased, but there are also other popular choices with dif- 
ferent attractive features. Note that x has size m, the 
size of the pixelized sky map. The y- vector contains all 
the observed visibilities and thus in general has a size 
much greater than m. The multiplication by A* is the 
step that accumulates or bins the observations into a sin- 
gle map. Equation A2 constitutes a form of lossless data 
compression. 

2. The key challenge: rapid multiplication by A* 

For the purpose of our discussion, the crucial point is 
being able to compute x or a good approximation to it 



^ For a discussion of the related topic of how to go straight from 
snapshots to power spectrum estimates, see [29] 



fast, in no more than of order A'^ log N operations which is 
what is needed to compute all the correlations in y. If we 
assume that the noise in each visibility is uncorrelated, 
then the noise matrix N is diagonal and multiplying by 
is computationally cheap. The key challenge is then 
being able to apply A* to the data in an efficient way. 
This requires that A be a sufficiently sparse matrix''^. Of 
course one has the choice of representing the vector x 
is any basis, for example one could pixclizc the sky in 
a standard fashion with one pixel per direction on the 
sky or one could represent the sky by its Fourier modes. 
The same is true for y. We will attempt to exploit this 
freedom to choose a basis in which A is maximally sparse. 
For simplicity, we will discuss only the unpolarized case 
below, as polarization does not change the problem in 
any fundamental way. 



a. The problem with real space 



The most straightforward choice would be pixclize the 
sky in angular space, with each entry in x corresponding 
to a different direction in the sky. For a given snapshot, 
a row in A then combines the intensity on the sky to 
give the response of a visibility, while each row of A* 
combines all the visibilities into an estimate of the sky in 
a given direction. Defined in this way. A* is not sparse 
or compact because visibilities have response everywhere 
within the primary beam. 

We can improve the situation by changing basis for 
y. We could use an FFT to go from the measured vis- 
ibilities in the uv plane to angle space (what is usually 
referred to as the "dirty map" in the radio astronomy 
literature) in iV log N operations. Now both y and x are 
in angle space. With this choice, the matrix A is noth- 
ing but what is usually called the instantaneous synthe- 
sized beam of the interferometer (the Fourier transform 
of the 2D baseline distributions shown in the top right 
panels of Figures 2-6). Unfortunately, for generic array 
layouts, the instantaneous synthesized beam tends to be 
quite complicated, with important side lobe structure, 
positive and negative regions, etc. Although better than 
before, the A matrix is therefore still not very compact, 
so multiplying the data by A* would dominate the com- 
putational costs, spoiling the advantage of the omniscope 
design. 



^ We will assume that one can also multiply by the M-matrix 
sufficiently fast; this can often be done with an iterative approach 
once the A*-multiplication is rapid, as successfully demonstrated 
by the WMAP team for their microwave background analysis[31]. 
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b. The problem with Fourier space 

If the omniscope is located at the South Pole, then 
what to do is rather trivial: one simply aggregates all 
correlation measurements in a single uv plane. That is 
to say, we use the Fourier basis to describe x. For each 
snapshot in time, an observed visibility is the convolution 
of the Fourier transform of the sky with the Fourier trans- 
form of the primary beam. The baselines probed by each 
antenna pair simply rotates with Earth. In this case, the 
entries of the A matrix are given by the Fourier trans- 
form of the primary beam, which can be made very com- 
pact, so the A* multiplication is computationally cheap. 
One accumulates the observations in the uv plane, and 
after 12 hours, one can then Fourier transform the re- 
sulting intensity to recover a map of the Southern half 
of the sky in an easy-to-interpret projection [18] . Fourier 
transforming the corresponding uv weight map recovers 
the effective synthesized beam. 

Unfortunately, for a general array layout, it is well- 
known that there are no easy solutions for mapping a 
large solid angle [19], because of the complication that 
the plane of the array rotates over time. One cannot 
simply accumulate observations in the same uv plane. 
Although instantaneously the uv plane is clearly the best 
choice, the uv plane is defined relative to the interferome- 
ter rather than fixed to the sky, so it is not a good basis to 
parametrize the sky once long observations are involved. 

c. A solution: multiple uv planes 

There is, however, a class of omniscope designs for 
which the global mapping problem can be easily solved 
for an arbitrary observing location on Earth: the case of 
a hierarchical grid where the lowest level of the hierarchy 
consists of square nxn antenna blocks that are fully sam- 
pled (with the individual antennas of order a wavelength 
apart). If each of these blocks is operated as a phased 
array, they can all be digitally pointed to a generic point 
f in the sky and will be sensitive only to signals from 
within about an angle ~ 1/n around this point. As 
long as n ^ 1, the flat sky approximation can be used for 
this patch of sky, and one can use standard radio astron- 
omy procedures to combine all digital pointings towards 
that sky patch in a uv plane defined as the Fourier dual 
of the tangent plane to r (extended of order A9 around 
f). Note that in this way, the uv plane fixed to the sky 
rather than to the plane of the instrument, and is used 
to represent a small patch. 

With an omniscope consisting of many such nxn 
blocks, one can follow exactly the same procedure, except 
that one simultaneously obtains different pointings of 
the antenna blocks. In practice one would divide the sky 
into a number of small patches ( "facets" ) over which the 
flat sky approximation is valid and use the Fourier ba- 
sis to describe the intensity in each patch. Each of the 
pointings of the omniscope would then be assigned to the 



uv plane of one of these patches. 




FIG. 8: The antenna layout (top loft) and 2D baseline dis- 
tribution (top right) of nine 40 x 40 antenna blocks appear 
deformed (bottom panel) when viewed from a sky direction 
other than the zenith (here the elevation is 30° and the az- 
imuth is 63°). 




FIG. 9: To make the flat sky approximation as accurate as 
possible within a fixed number of pixels, it is helpful to make 

the pixels hexagonal as in this equal-area icosahedron-based 
scheme [30] (the pixel centers are shown as dots). 

As an illustration, consider the following specific ex- 
ample. We have nine nxn antenna blocks with n = 40, 
arranged as in Figure 8. Since the primary beam size of 
each such block will be a few degrees, we partition the 
sky into 6252 hexagonal pixels of roughly this size using 
the pixclization method of [30] as illustrated in Figure 9: 
each point in the figure corresponds to a pixel center, 
and each point (unit vector r) belongs to the pixel whose 
center it is closest to. This makes generic pixels hexago- 
nal, which has the advantage of minimizing sky curvature 
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effects [30]. For eaeh pixel center, we keep track of an as- 
sociated uv plane. Note that the sky pixelization is fixed 
in the sense that it does not change as the Earth rotates. 
We now process the omniscope data as follows: 

1. Collect data for say 1 second. 

2. FFT the data from each antenna in the time do- 
main and calibrate as necessary. 

3. For a given frequency bin, arrange the data in a 
hypercube as in Section II. 

4. Perform a two-dimensional FFT for each of the 
blocks, thereby obtaining digital pointings in 32 x 
32 different sky directions r^, z = 1, 1024. 

5. Perform the correlation in all remaining dimensions 
using a pair of FFT's as in Section II. 

6. Merge the data from each such pointing r,; into the 
uv plane corresponding to the closest pixcil center, 
bearing in mind that the baseline distribution must 
be appropriately rotated and shortened by a fac- 
tor cos 6 = z -Ti to reflect the way the array looks 
viewed from the direction fj, and that the polariza- 
tion vector must be correspondingly rotated. The 
phases must also be adjusted to reflect the transla- 
tion from each pixel center to . For the pointing 
direction f j from which the array looks deformed 
like in Figure 8 (bottom left), the corresponding 
uv weight function thus gets augmented by the cor- 
responding deformed baseline distribution (bottom 
right). 

7. Repeat for all frequency bins of interest. 

8. Repeat for as long an integration as desired, ex- 
ploiting that the pointing directions in celestial 
coordinates rotate with Earth. 

9. FFT the data in each of the 6252 uv planes to ob- 
tain maps of the corresponding sky patches, and 
FFT the corresponding uv weight functions to ob- 
tain the corresponding synthesized beam functions. 

10. Seemlessly merge all the patches into a single map. 

It is interesting to compare this approach (for observa- 
tions in a single uv plane) with what [32] term "software 
holography". The similarity is that in both cases, each 
measured visibility is accumulated in the ut;-plane not 
as a delta function (by adding the measured complex 
number to a single uv pixel), but rather as the Fourier 
transform of the instantaneous primary beam (corrected 
for deformation and translation as in Figure 8), adding in 
this uv image multiplied by the measured complex num- 
ber. In both cases, it is important that the uv plane 
where the visibilities and weight functions are accumu- 
lated are are well oversampled, to adequately resolve the 



antenna shapes. The difference is that software holog- 
raphy docs this operation once for every sample (say 
once every nanosecond), whereas with our omniscope ap- 
proach it is sufficient to do this only once per snapshot 
(say every second). This omniscope approach thus en- 
ables major computational savings, since all that needs 
to happen for every sample is the much simpler FFT- 
processing, where no oversampling is needed and each 
measured number is inserted in only one place (in the 
hypercube) . 

Perhaps the only subtlety in the omniscope mapmak- 
ing algorithm above relates to the splitting of the sky into 
domains and the combination of the different domains 
into a single final map. In particular, a given pointing of 
the omniscope could land on the boundary between do- 
mains. We can treat this by making the domains overlap 
as follows. First write 

y = Ax + n = PFDx + n, (A3) 

where we have decomposed the A matrix into several 
pieces. First D splits the sky into domains. Note that 
we can decide to assign each pixel on the sky to more 
than one domain. For example we can choose to make 
the domains overlap in such a way that whenever the cen- 
ter of a pointing of the omniscope falls in one domain, we 
will assign the entire primary beam to that domain. As 
a result, the domains will have to overlap as a pointing 
whose center is near the edge of a domain "spill" into 
neighboring ones. Rather than assigning measurements 
of each pointing to more than one patch, we simply make 
the patches overlap and assign each pointing to only one 
patch. As a result the D-matrix is not a square matrix. 
If for example we choose to have each pixel on the sky 
belong to three domains, D is a 3m x m matrix. The 
matrix F simply changes basis to describe the intensity 
in each domain by its Fourier transform, that is to say 
it goes from angle space to the uv plane. Finally, each 
visibility is given by the convolution of the Fourier modes 
in a given patch with the primary beam of the tile, en- 
coded in the matrix P. As described above, the projected 
separation of each pair of antennas and the location of 
the center of the pointing relative to the center of the 
sky patch arc needed to compute P. Note also that the 
size and shape of the primary beam also depends on the 
direction of the pointing. 

Now the estimate of the sky becomes 

X = MD*F*P*N- V, (A4) 

where P* accumulates the visibility measurements in the 
individual uv planes fixed to the sky. After all the mea- 
surements were accumulated in each domain, F* goes 
back to angle space from Fourier space. Finally, D* com- 
bines the measurements in all the domains into a single 
sky map. 
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